Emotions in Speech: Tagset and Acoustic Correlates

Author

  • Sofia Gustafson-Capková

Abstract

In recent years, interest has grown in automatically detecting and interpreting emotions in speech on the one hand, and in generating certain emotions in speech synthesis on the other. Areas where such knowledge might improve a system include dialogue and expert systems, but also applications for disabled people. This report is a short survey of research within the field of emotions in speech. Special attention is paid to which categories are used for tagging speech corpora for emotions, and to which acoustic correlates these categories might be connected.

0 Introduction

Much research has been done on the topic of emotions in speech. Already in the early 20th century, attempts were made to connect certain forms of speech to certain emotions (Armstrong & Ward, 1926). From that time onward these attempts were repeated. The early research was carried out mainly in the field of psychology, but with the introduction of speech synthesis and automatic speech recognition (ASR), the psychological branch has been accompanied by a more technical, application-based approach. Many researchers in the field of speech technology have worked during the last decade on different aspects of emotions in speech. One goal is to make speech synthesis sound more natural; another is to be able to recognise the emotive state of a speaker in, e.g., a dialogue system. One motivation for the claim that emotions are signalled in speech prosody is experiments which have shown that subjects can recognise the emotive content of a speech sample even when all word meaning is filtered out (e.g. Banse & Scherer, 1996; Brown, 1980; Mozziconacci, 1998; Pereira, 2000; Scherer, 1981; Soskin & Kauffman, 1961). That is, from intonation alone, subjects are able to recognise the emotion behind an utterance. Emotions colour the language, and can make meaning more complex.
As listeners we also react to the speaker's emotive state and adapt our behaviour depending on what kind of emotions the speaker transmits: we may, for example, try to show empathy towards sad people, or, if someone hesitates, try to make the person clarify what s/he means or wants. To classify the emotive state of a speaker on the basis of prosody and voice quality, we have to identify acoustic features in the speech that are connected to certain emotions. This also implies the assumption that the voice alone really carries full information about the speaker's emotive state. This assumption is often taken for granted, but e.g. Stibbard (2001) considers it questionable. Research in emotional speech has a long tradition, but in recent years the need for applicable results in this field has grown with increasingly sophisticated automatic spoken language systems, such as SmartKom (Batliner et al., 2001). Access to the emotive information in speech could benefit certain applications; for instance, it would be useful to be able to take into account whether the speaker in, e.g., a dialogue system is frustrated, irritated or content. By recognising emotions from features in the speech signal, a system might detect the emotive state of a person and respond accordingly, as well as speak in a way that people feel comfortable with. Experimental results have shown that an agent who can signal empathy is significantly more effective than control conditions in helping to relieve frustration (Klein, 1999). However, the findings in the area of acoustic correlates of emotions are not always encouraging, nor very homogeneous: results sometimes point in contradictory directions, and it is hard to define what counts as valid data concerning emotive speech. In this paper I am going to give a brief survey of research in the field of emotive speech.
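The pipeline implied above, extracting acoustic features from the signal and mapping them to an emotion label, can be illustrated with a toy sketch. This is not any method from the surveyed literature: the autocorrelation-based F0 estimator is a deliberately crude stand-in for a real pitch tracker, and the decision thresholds in `classify_emotion` are invented purely for illustration, loosely echoing the often-reported trend that arousal raises F0 and energy while sadness lowers them.

```python
import math

def estimate_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Crude F0 estimate for one frame: pick the autocorrelation peak
    among lags corresponding to plausible speech pitch values."""
    lo, hi = int(sr / fmax), min(int(sr / fmin), len(frame) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lo, hi):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_r, best_lag = r, lag
    return sr / best_lag if best_lag else 0.0

def prosodic_features(signal, sr, frame_len=400):
    """Mean F0, F0 range, and mean energy over fixed-length frames (50 ms at 8 kHz)."""
    f0s, energies = [], []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > 1e-4:  # crude voicing gate: skip near-silent frames
            f0s.append(estimate_f0(frame, sr))
            energies.append(energy)
    return (sum(f0s) / len(f0s), max(f0s) - min(f0s),
            sum(energies) / len(energies))

def classify_emotion(mean_f0, f0_range, mean_energy):
    """Toy decision rules with invented thresholds, for illustration only."""
    if mean_f0 > 220 and mean_energy > 0.1:
        return "aroused (anger/joy)"
    if mean_f0 < 150:
        return "subdued (sadness)"
    return "neutral"

sr = 8000
# Synthetic one-second "utterances": pure tones standing in for voiced speech.
calm    = [0.5 * math.sin(2 * math.pi * 180 * t / sr) for t in range(sr)]
excited = [0.8 * math.sin(2 * math.pi * 300 * t / sr) for t in range(sr)]
print(classify_emotion(*prosodic_features(calm, sr)))     # neutral
print(classify_emotion(*prosodic_features(excited, sr)))  # aroused (anger/joy)
```

Even this toy version shows where the real difficulty lies: the features are easy to compute, but the mapping from feature values to emotion categories is exactly what the contradictory results in the literature concern.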
After a survey of some earlier investigations I will focus specifically on three aspects: i) speech data, ii) emotive categories, and iii) tagsets for emotions. The outline of the paper is as follows. In section 1, Choice of corpora, I examine what kinds of data have been used in different corpora and investigations. In section 2, Categories of emotion, I give an account of psychological motivations for the categorisation of emotions; I discuss along which dimensions emotions can be classified, as well as the distinction between emotions and attitudes. Section 3, Acoustic correlates, consists of an overview of results from studies in the field of emotional speech regarding the acoustic correlates of emotions. In section 4, Evaluations of tagsets, I give an account of an evaluation of an existing tagset developed on the basis of spontaneous speech, and in section 5, Synthesis and recognition of emotions in speech, I give an account of the results of some experiments with emotional speech synthesis. I finish with section 6, Summary and discussion. At the very end of the paper I have listed some links and resources that can be found on the web. The list is not meant to be exhaustive, but rather to give the interested reader a small sample of web pages of interest (please try the demos, they are really VERY funny!).

Speech technology, term paper, Autumn '01, Sofia Gustafson-Capková



Publication date: 2002